Tag
26 articles
Learn how to implement a reinforcement learning framework that encourages multimodal AI models to request help when visual information is missing, improving accuracy and reliability.
Peter Bailis, former CTO of Workday, has left his executive role to join Anthropic as a technical staff member, focusing on reinforcement learning engineering.
Alibaba's Qwen team introduces an algorithm that enhances AI reasoning by weighting each step according to its influence on subsequent decisions, effectively doubling the length of the models' reasoning chains.
Learn how to work with compact language models like Liquid AI's LFM2.5-350M by setting up environments, loading models, performing inference, and understanding reinforcement learning integration.
Explore the significance of Hugging Face's TRL v1.0, a unified framework for aligning large language models through post-training techniques such as SFT, reward modeling, DPO, and GRPO.
This article explains the advanced AI technologies behind Amazon's real-time deal optimization during the Spring Sale, including reinforcement learning, time-series forecasting, and multi-armed bandit algorithms.
This article explains how Amazon's Spring Sale leverages advanced AI systems including reinforcement learning, neural collaborative filtering, and real-time data processing to optimize pricing and personalization.
This article explains hyperagents, advanced AI systems that can improve both their task performance and their own learning mechanisms. It explores how these self-improving systems work and why they represent a significant advancement in artificial intelligence.
Learn how NVIDIA's ProRL Agent uses a new approach to train AI systems for complex, multi-turn conversations. This breakthrough could make AI assistants much more helpful for real-world tasks.
Learn how NVIDIA's new PivotRL framework improves AI training efficiency by combining supervised learning and reinforcement learning techniques to achieve better performance with fewer attempts.
Learn to build a Deep Q-Network (DQN) reinforcement learning agent from scratch using JAX, RLax, Haiku, and Optax to solve the CartPole environment.
This article explains how OpenAI's new model selection system works in ChatGPT, detailing the technical mechanisms behind dynamic model routing and its significance for AI deployment strategies.